Understanding the mechanism of protein secondary structure formation is an
essential part of the protein-folding puzzle. Here we describe a simple model
for the formation of the β-hairpin, motivated by the fact that folding of a
β-hairpin captures much of the basic physics of protein folding. We argue
that the coupling of "primary" backbone stiffness and "secondary" contact
formation (similar to the coupling between the "secondary" and "tertiary"
structure in globular proteins), caused for example by side-chain packing
regularities, is responsible for producing an all-or-none 2-state β-hairpin
formation. We also develop a recursive relation to compute the phase diagram
and the single-exponential folding/unfolding rate arising via a dominant
transition state.



I. INTRODUCTION 

Recent years have seen an intense effort aimed at elucidating the physics
underlying protein folding. One crucial question concerns the nature of the
transition from the random coil to the native conformation. Essentially, we
wish to discover the critical parameters governing whether this transition is
first-order ("all or none") or continuous, and furthermore we wish to
characterize the transition kinetics. In this paper, we focus on these issues
in a very simple context, the formation of a β-hairpin.

For the past forty years, the α-helix-coil transition has been extensively
studied. Here, the transition is in general continuous rather than abrupt;
hence there is no 2-state behavior. In comparison, however, a recent
experiment [?] has shown that β-hairpin formation can exhibit a 2-state
collective behavior between the random coil (unfolded) and native hairpin
(folded) states. Recent computational studies have concluded that
hydrogen-bond formation between the two sides of the hairpin is insufficient
to produce an all-or-none 2-state behavior. Instead, one must also take into
account the hydrophobic side-chain packing regularities. Translated into the
language of simple models, one would therefore expect that a simple pairwise
Go-like interaction would not give rise to an all-or-none transition;
instead, one must add additional terms corresponding to coupling the
"secondary" structure (the contact formation between residues on the
opposite sides of the hairpin) with the "primary" structure (the stiffness
of the backbone).

The purpose of this paper is to introduce a simple, ex- 
actly solvable model which allows one to calculate the 
equilibrium states and the transition kinetics of a model 
with this type of coupling. We describe the hairpin by 
two Gaussian chains (attached at the turn of the hair- 
pin) whose interaction is described by two types of terms. 
There is a pairwise Go-like interaction mimicking the 
hydrogen-bond formation and a short ranged many-body 
interaction approximating the side chain packing regular- 
ities. To simplify our model, we assume that the hydro- 
gen bond formation and side chain packing regularities 
are uniformly distributed among the residues. This al- 
lows us to develop a set of recursion relations for the 
exact determination of the partition function, and show 
the range of parameters for which the hairpin has 2-state 
behavior. Finally, we can estimate the (single exponen- 
tial) folding/unfolding rate via calculating the thermo- 
dynamic weight of the "critical" droplet/bubble. 



II. THE MODEL HAMILTONIAN 

We consider a hairpin polymer composed of two interacting Gaussian chains
(labeled as branches 1 and 2) connected by a β turn at the proximal end,
labeled as sequence index i = 0 in Fig. 1(a). To have a unique native
structure, we impose N pairwise Go interactions on this polymer, which mimic
the hydrogen bonds formed by the 2N residues. Our approach assumes that one
can write down an effective Hamiltonian in terms of the spatial coordinates
$x_i^{(k)}$ (i is the residue index counted away from the β turn and the
superscript k = 1, 2 is the branch label) of these Go-interacting residues.
The non-interacting part of this Hamiltonian is simply
$\sum_{k=1,2} H_{\rm Gau}^{(k)}$, where each Gaussian term is quadratic in
the bond vectors. Here $\nabla x_i^{(k)} = x_i^{(k)} - x_{i-1}^{(k)}$ is the
vector connecting nearest-neighbor residues on each chain, and k is the
backbone stiffness.

The second ingredient of the Hamiltonian is the inter-chain interaction. As
already discussed, we use the Go interaction to mimic the hydrogen bond
formation. These bonds result from a short-distance proximity between the
donor and acceptor residues (such that the water or ion molecule serving as
counter-ion shielding can be squeezed out). Having a solvable model requires
us to assume that the binding strength is uniformly distributed along the
chain. This leads to the form
$H_{\rm Go}(x_i) = -V_1 \Delta(|x_i|) = -V_1 \Delta_i$ with $V_1 > 0$, where
$\Delta_i = 1$ if the inter-residue distance
$|x_i| = |x_i^{(1)} - x_i^{(2)}|$ falls into an effective attraction window
$|x_i| < r_0$ and 0 otherwise. This "box" approximation to the potential has
also been used by other groups [?].
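The box potential described above can be illustrated with a short Python sketch. It assumes each branch is given as a list of 3D coordinates for the Go-interacting residues; the function names `contact_indicator` and `go_energy` are illustrative and do not come from the paper.

```python
import math

def contact_indicator(x1, x2, r0):
    """Return 1 if the two residues lie inside the attraction window
    (distance strictly below r0), and 0 otherwise."""
    return 1 if math.dist(x1, x2) < r0 else 0

def go_energy(branch1, branch2, v1, r0):
    """Total box-potential energy: -v1 for every residue pair i whose
    two members (one on each branch) are in contact."""
    contacts = sum(contact_indicator(a, b, r0) for a, b in zip(branch1, branch2))
    return -v1 * contacts
```

For example, with `branch1 = [(0, 0, 0), (1, 0, 0)]`, `branch2 = [(0.1, 0, 0), (3, 0, 0)]`, `v1 = 2.0` and `r0 = 0.5`, only the first pair is within the window, so one contact contributes to the energy.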
FIG. 1. (a) A β-hairpin polymer with two interacting branches. Each branch
contains N residues with Go-like inter-chain hydrogen bond interactions
($V_1$, the dashed line) and with cooperative side-chain packing terms
($V_2$, the dashed box). Here only the Go-interacting residues are shown.
The degrees of freedom of the non-Go-interacting residues are assumed to be
"renormalized" into those of the Go-interacting ones. (b) The hairpin
zippered from the β turn. In the folding (unfolding) regime, the droplet
(bubble) will expand.
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terms of the inter-residue distances only. The remaining 
Gaussian component will be 
Without loss of generality, we assume that the β turn
is fixed at $x_0 = 0$ and drop the Gaussian component
\xjj ^ — x^\ 2 . This should not affect our result signifi- 
cantly if the system size is large enough. The interaction 
component is 



k=2 
We now proceed to find the partition function for this model. Since we are
interested in the case of short-ranged Go interaction (i.e., the effective
contact distance $r_0$ is far smaller than the thermodynamic mean
inter-residue distance $1/\sqrt{\beta k}$), it is reasonable to replace the
Gaussian connectivity term $\frac{k}{2}|\nabla x_k|^2$ by
$\frac{k}{2}|x_k|^2$ if $x_{k-1}$ is in a contact position. This
approximation will greatly simplify the calculation.



III. THE PHASE DIAGRAM 



The final effects we consider arise from the (hydrophobic) side-chain
packing effect. This interaction depends both on the formation of
inter-chain contacts and on the alignment of the local backbone [?]. In
principle, there are two separate pieces that one can add to our model to
mimic this interaction. First, the presence of one hydrogen bond might, via
the local structural dynamics (such as squeezing out water molecules),
cooperatively help other neighboring residues form contacts. Such a
collective term is a sum over residue pairs i, j of $-V_2 \Delta_i \Delta_j$
weighted by a coupling function of i and j; as always, $\Delta_i$ indicates
the contact formation of residue pair i, and the coupling function indicates
the range of cooperativity. The simplest short-range assumption is that the
coupling function is 1 when i, j are nearest neighbors and 0 otherwise; one
might also imagine a longer-ranged hierarchical side packing scheme such as
that considered by Hansen et al. [?]; here we stick to the simplest
possibility. This new term reflects a coupling from the primary structure to
the secondary one. In addition, forming a contact limits the conformations
available to the chain; in our Gaussian model, this corresponds to an
increase in the chain stiffness between i and i + 1, quantified in what
follows by the parameter $\gamma$.



Given the above simplification, our model can be exactly solved in terms of
a set of recursion relations. We are eventually interested in the full
partition function, which reads
where d is the dimensionality, which we will take as 3. It will also be
convenient to consider several "restricted" partition functions,
corresponding to summing over all the states consistent with some extra
constraints. First, we define $Z_{N,n_f}$ as the restricted partition
function of an ensemble that contains all configurations with a particular
contact number $n_f$. The full partition function is just a sum of
$Z_{N,n_f}$ over this contact number. Second, we define $Z^{c}_{N,n_f}$ as
the restricted partition function of an ensemble specified by contact number
$n_f$ with the polymer distal end i = N being in a contact position (as
indicated by the superscript "c").

Finally, we introduce the partition function for a completely unfolded
polymer segment running from sequence index j to sequence index i, weighted
by additional terms on the two end residues. We denote the path integral of
this unfolded segment by $M_{i,j}(\mu_1, \mu_2)$, where $\mu_1$ and $\mu_2$
are the weights on its two ends.

In Fig. 2 we show how a variety of path integrals, which form the building
blocks for the entire partition function, can be represented in terms of
$M_{i,j}(\mu_1, \mu_2)$. This notation allows us to break up the partition
function into a sum of partition functions for smaller subsystems.
Specifically, we have for $n_f > 0$,

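$$ Z_{N,n_f} = Z^{c}_{N,n_f} + \sum_{k=n_f}^{N-1} M_{N,k+1}\!\left(\infty, \tfrac{1}{1+\gamma}\right) Z^{c}_{k,n_f} \qquad (5) $$

and for $n_f > 1$

$$ Z^{c}_{N,n_f} = q\,e^{\beta(V_1+V_2)}\, Z^{c}_{N-1,n_f-1} + q\,e^{\beta V_1} \sum_{k=n_f-1}^{N-2} M_{N-1,k+1}\!\left(\tfrac{1}{1+\gamma}, \tfrac{1}{1+\gamma}\right) Z^{c}_{k,n_f-1} \qquad (6) $$

with $q = \frac{4\pi}{3} r_0^3 \left[\beta k/2\pi\right]^{3/2} \ll 1$. These are supplemented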



by the boundary conditions Mjj(/zi, M2) = if j > i, 
$Z_{N,0} = M_{N,1}(\infty, 1)$ (the coil state with $n_f = 0$). It is
straightforward to obtain $Z_{N,N} = [q\,e^{\beta(V_1+V_2)}]^{N}$; also,
$Z_{N,0}$ is always less than unity. These will be used later.
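As a numerical illustration of the native-state weight, a minimal Python sketch follows. It assumes the small-$r_0$ estimate $q \approx \frac{4\pi}{3} r_0^3 (\beta k/2\pi)^{3/2}$ (the contact-ball volume times the Gaussian normalisation in d = 3) and the closed form $Z_{N,N} = [q\,e^{\beta(V_1+V_2)}]^N$; the function and argument names are illustrative, and the logarithm is used to avoid underflow at large N.

```python
import math

def contact_weight(beta, k, r0):
    """Assumed small-r0 estimate of q: volume of the contact ball times
    the Gaussian normalisation (beta*k / 2*pi)^(3/2). Meaningful for q << 1."""
    return (4.0 * math.pi / 3.0) * r0**3 * (beta * k / (2.0 * math.pi)) ** 1.5

def log_z_native(n, beta, k, r0, v1, v2):
    """ln Z_{N,N} = N * [ln q + beta * (V1 + V2)] for the fully zipped hairpin."""
    return n * (math.log(contact_weight(beta, k, r0)) + beta * (v1 + v2))
```

Comparing `log_z_native(...)` with the logarithm of the coil weight at a given temperature indicates which of the two end states is thermodynamically favoured.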
FIG. 2. Using the $M_{i,j}(\mu_1, \mu_2)$ functional to represent the path
integral of several diagrams. (a) For a polymer segment starting from an
"open" distal end N and ending at a "contact" residue i > 0, without any
residue between N and i being in contact position, the path integral is
equal to $M_{N,i+1}(\infty, \frac{1}{1+\gamma})$. Here the contact energy
gain from residue i is not included. (b) For a polymer segment starting from
the proximal end 0 and ending at a "contact" residue i, without any residue
between the proximal end and i being in contact position, the path integral
reads $M_{i-1,1}(\frac{1}{1+\gamma}, 1)$. (c) For a polymer segment starting
from one contact residue i and ending at another one j < (i - 1), without
any residue between i and j being in contact position, the path integral is
$M_{i-1,j+1}(\frac{1}{1+\gamma}, \frac{1}{1+\gamma})$. (d) For a polymer
segment containing only two contiguous contact residues, the path integral
reduces to 1. (e) If the entire polymer is "open", i.e., none of the
residues are in contact position, the path integral equals
$M_{N,1}(\infty, 1)$.



The final step of our solution involves finding a recursive formula for
$M_{i,j}(\mu_1, \mu_2)$. Recall that we have assumed that the contact
distance $r_0$ is far smaller than the thermodynamic inter-residue distance
$1/\sqrt{\beta k}$ and that therefore we can replace the Gaussian
connectivity $\frac{\beta k}{2}|\nabla x_k|^2$ by
$\frac{\beta k}{2}|x_k|^2$ if $x_{k-1}$ is in contact position. If we
consider a particular i, j pair, we can integrate out the coordinate of
residue i; this will yield the two terms



Mij(^i,/i 2 ) 
Mi 
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Mi 



Mi-xj(^x + 1,M2) - qMi-ij (1,(43) 



The terms correspond to whether this residue is not or 
is in contact. Again, this must be supplemented by the 
boundary condition 



Mjj(nx, nn) = [MiM 2 /(/ii +H2)Y- 
Using these recursion relations, we can now compute the thermodynamic
probability of the native ($n_f = N$), unfolded coil ($n_f = 0$), and
partially folded ensembles. The partition function is dominated by the
highest-probability state. An all-or-none transition will occur if this
dominant state changes from coil to native (as the temperature is lowered)
without passing through an extended region of parameter space in which
intermediate states dominate; otherwise, the transition will be continuous.
Our results indicate that at a fixed value of the cooperative stiffness
increase $\gamma$, the transition between coil and native states is
2-state-like as long as $V_2$ is above some point $C_e$. Here $C_e$ is the
triple-phase coexistence point (of the coil, native, and one partially
folded ensemble). This is shown in Fig. 3(a) for the case of N = 100 and
$\gamma = 1$.

As $\gamma$ increases, the intermediate regime shrinks and eventually
disappears. In other words, in order to obtain an all-or-none transition, we
must have a minimal side-chain packing strength $V_{2,\min}$ with respect to
a particular set of $V_1$ and $\gamma$; for large enough $\gamma$,
$V_{2,\min} = 0$. This behavior is plotted in Fig. 3(b).
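The classification rule used for the phase diagram (which restricted ensemble carries the largest weight) can be sketched in Python. The sketch assumes the values $\ln Z_{N,n_f}$ have already been obtained, for instance from the recursion relations, and are supplied as a list indexed by $n_f = 0, \ldots, N$; the function names are illustrative.

```python
import math

def ensemble_probabilities(log_z):
    """Convert ln Z_{N,n_f} (list index = n_f) into normalised
    probabilities, shifting by the maximum for numerical stability."""
    shift = max(log_z)
    weights = [math.exp(value - shift) for value in log_z]
    total = sum(weights)
    return [w / total for w in weights]

def dominant_state(log_z):
    """Label the most probable ensemble as 'coil' (n_f = 0),
    'native' (n_f = N) or 'intermediate' (anything in between)."""
    n_best = max(range(len(log_z)), key=lambda n: log_z[n])
    if n_best == 0:
        return "coil"
    if n_best == len(log_z) - 1:
        return "native"
    return "intermediate"
```

Scanning the temperature at fixed $V_2$ and recording where `dominant_state` changes label is one way to check whether an intermediate regime separates the coil and native phases.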

Although it is not relevant for the biological system, it is interesting
from the general statistical physics perspective to consider what happens to
the minimal $V_2$ as N gets large. If we define the folding temperature at
the point $C_e$ as $T_c = 1/k_B\beta_c$, then $V_{2,\min}$ and $\beta_c$
satisfy
where "max{$x_1, x_2, \ldots$}" picks out the maximal element of the set
{$x_1, x_2, \ldots$}. This leads to
$Z_{N,0}|_{\beta_c} = Z_{N,N}|_{\beta_c} > Z_{N,N-1}|_{\beta_c}$. From the
recursion relations (5) and (6), we find that the "large N" component in
$Z_{N,N-1}$ is
$(N-2)\,[q\,e^{\beta V_1}]^{N-1}\, e^{(N-2)\beta V_2}\, M_{k,k}(\frac{1}{1+\gamma}, \frac{1}{1+\gamma})$,
with the one-residue loop entropy $M_{k,k}$ described by eqn. ([?]).
Combining this with the previous formula for $Z_{N,N}$, the above inequality
requires



1 > q c e 
with $q_c \approx \frac{4\pi}{3} r_0^3 [\beta_c k/2\pi]^{3/2} \ll 1$.
Clearly, this implies that in the large-N limit, the surface tension penalty
(arising from $\gamma$ and $V_2$) must be large enough to compete with the
combinatory entropy effect. At small $\gamma$, the only possible behavior
consistent with the above inequality is $T_c \sim N^{2/3}$ and
$V_{2,\min} \sim N^{2/3} \ln N$. If $\gamma$ is large, it appears that there
is another choice, namely the vanishing of the one-residue entropy,
$M_{k,k}(\frac{1}{1+\gamma}, \frac{1}{1+\gamma}) \sim O(1/N)$. In this case,
we find that $T_c$ approaches a constant,
$k_B T_c \approx 0.83\, r_0^2 (1+\gamma) k + O(1/N)$, and
$V_{2,\min} \approx k_B T_c \ln[(1+\gamma)^{3/2} - 1] - V_1 + O(1/N)$. One
should note, however, that in this limit our original assumption that one
can approximate the Gaussian factors evaluated at the contact distance
$r_0$ as unity is no longer accurate, and the model crosses back to the
requirement of an N-dependent cooperative interaction.
FIG. 3. (a): The phase diagram for an N = 100, $\gamma = 1$ β-hairpin. Here
the side-chain packing energy $V_2$ and temperature $k_B T$ are scaled in
units of the hydrogen bond energy $V_1$. Above the point $C_e$, the polymer
has a 2-state all-or-none transition. Below $C_e$ is the "intermediate"
regime (the shaded area enclosed by the dashed line) where the transition
might be continuous. As $\gamma$ increases, $C_e$ decreases toward zero and
the intermediate regime eventually vanishes. (b): The minimal side-chain
packing energy $V_{2,\min}$ required for an all-or-none 2-state folding
transition, as a function of the backbone stiffness modulation $\gamma$ and
hairpin size N ($T_c$ at $C_e$ is also computed).

To summarize, we find that 2-state behavior arising from short-ranged
interactions can only exist in finite-size hairpin systems. For such a
finite system, increasing the stiffness of the bound parts of the chain will
lead to this behavior at biologically relevant values of the chain length N.
These results are consistent with those obtained in other two-chain
models [?].



IV. THE FOLDING/UNFOLDING RATE 

We now compute the folding/unfolding rate for side-chain packing strength
$V_2 > V_{2,\min}$ (i.e., the parameter regime above the triple-phase point
$C_e$). The transition is 2-state-like, and the transition rate is of the
Arrhenius form $k_{tx} \sim k_0 e^{-\beta \Delta F}$, where $k_0$ is a
kinetic prefactor (which might be different for folding versus unfolding
transitions), and the exponent $\Delta F$ is the free energy difference
between the saddle point configuration (the transition state structure) and
the metastable state. In our system, the Arrhenius form is just the
thermodynamic probability ratio between the transition (tx) and metastable
ensembles, $Z_{tx}/Z_{meta}$; here, $Z_{meta} = Z_{N,N}$ if $T > T_f$,
$Z_{meta} = Z_{N,0}$ if $T < T_f$, and $T_f$ is the folding temperature
defined by $Z_{N,0}|_{T_f} = Z_{N,N}|_{T_f}$.
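The rate prescription above amounts to a difference of log partition functions. A minimal Python sketch, assuming $\ln Z_{tx}$, $\ln Z_{N,N}$ and $\ln Z_{N,0}$ are already known at the temperature of interest, is given below; the function name and argument names are illustrative, and the kinetic prefactor $k_0$ is deliberately left out.

```python
def log_rate_over_prefactor(log_z_tx, log_z_native, log_z_coil,
                            temperature, t_fold):
    """Return ln(k_tx / k_0) = ln Z_tx - ln Z_meta.

    Above the folding temperature the native state is the metastable one
    (unfolding); below it the coil is metastable (folding)."""
    log_z_meta = log_z_native if temperature > t_fold else log_z_coil
    return log_z_tx - log_z_meta
```

Working with logarithms keeps the ratio well behaved even when the individual partition functions are extremely small.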

Since there are 3N degrees of freedom in our model, in general one can
expect the existence of multiple saddle point configurations. Each
configuration specifies a particular "pathway" towards folding/unfolding.
The saddle point configurations must be partially folded states and their
contact residues could be inhomogeneously distributed. This inhomogeneity
and the number of contacts in turn determine the thermodynamic probabilities
of these configurations. Among them, the most likely pathway is mediated by
the configuration that has the maximal thermodynamic probability. Due to the
surface tension effect (arising via nonzero $\gamma$), only two particular
structures need be considered, i.e., the polymer zipping from either the
β turn or the distal end. As suggested by experiment [?] and by the simple
logic that $x_0 = 0$ biases the polymer towards folding, the configuration
in which the "folding" droplet emerges from the β turn is the most likely
one (Fig. 1(b)). If we define the restricted partition function for this
droplet (i.e., the polymer zipping from the β turn to sequence index k,
0 < k < N) as $Y_k$, from the recursive relation, we have



$$ Y_k = \left[q\,e^{\beta(V_1+V_2)}\right]^{k} M_{N,k+1}\!\left(\infty, \tfrac{1}{1+\gamma}\right) $$
The transition state, therefore, is defined as the particular state
$Z_{tx} = Y_k$, with $Y_k$ satisfying $Y_k < Y_{k\pm 1}$.
For $T_E^{unfold} > T > T_{cx}^{unfold}$ (or
$T_{cx}^{fold} > T > T_E^{fold}$), on the other hand, the size of the
"critical" unfolding bubble (folding droplet) is one. Here the "crossover"
temperature $T_{cx}^{unfold}$ is defined by
$Y_{N-1}|_{T_{cx}^{unfold}} = Y_{N-2}|_{T_{cx}^{unfold}}$, and
$T_{cx}^{fold}$ is defined by
$Y_2|_{T_{cx}^{fold}} = Y_1|_{T_{cx}^{fold}}$. Finally, for
$T_{cx}^{fold} < T < T_{cx}^{unfold}$, the critical unfolding bubble (if
$T > T_f$) or folding droplet (if $T < T_f$) has a size k between 1 and
N - 1, determined by the variational condition $Y_k < Y_{k\pm 1}$.
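The variational condition $Y_k < Y_{k\pm 1}$ selects the least probable zipped state along the pathway. A minimal Python sketch of that search, assuming the values $\ln Y_k$ for k = 1, ..., N - 1 are already available as a list (the function name and the list layout are illustrative), could read:

```python
def critical_size(log_y):
    """Given log_y[k-1] = ln Y_k for k = 1..N-1, return the droplet
    (or bubble boundary) index k at which Y_k is smallest, i.e. the
    transition state along the zipping pathway."""
    best_index = min(range(len(log_y)), key=lambda i: log_y[i])
    return best_index + 1  # convert 0-based list index to k
```

If the minimum sits at the first or last entry, the critical droplet or bubble has size one, which corresponds to the crossover regimes described above.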

For completeness, we also computed the folding and unfolding rates based on
the configuration in which the droplet is initiated from the distal end
instead of the β turn. It turns out that its folding/unfolding rate is
$10^3$-fold lower than for the previous configuration. This confirms our
assumption and is consistent with the experimental observation that
additional inter-chain interaction near the β turn enhances the folding
rate [?]. Thus we conclude that the dominant pathway for polymer folding is
zipping from the β turn, whereas the pathway for unfolding is unzipping from
the distal end.
FIG. 4. (a): The Arrhenius component $\ln[Z_{tx}/Z_{meta}]$ for the
folding/unfolding transition and the corresponding critical droplet/bubble
size as functions of temperature. Note the "crossover" regime
($T > T_{cx}^{unfold}$ and $T < T_{cx}^{fold}$). (b): The appearance of the
crossover temperature $T_{cx}$ and end temperature $T_E$ is determined by
the critical droplet/bubble size. When $T = T_E^{unfold}$ (the top one),
$Z_{N,N} = Y_{99}$. When $T = T_{cx}^{unfold}$ (the second top one),
$Y_{99} = Y_{98}$. Likewise, when $T = T_{cx}^{fold}$ (the second lowest
one), $Y_2 = Y_1$, and when $T = T_E^{fold}$ (the lowest one),
$Y_1 = Z_{N,0}$. The transition state is the lowest {$\ln Y_k$} on every
curve.

While computing the transition state, we find that for a particular set of
the parameters ($V_2$, $V_1$, $\gamma$, $r_0$), the behavior of the folding
droplet or the unfolding bubble is characterized by several temperatures:
$T_E^{unfold/fold}$, $T_{cx}^{unfold/fold}$, and $T_f$ (Fig. 4). For
$T > T_E^{unfold}$ (or $T < T_E^{fold}$), there is no transition state and
the transition from the native (coil) to the coil (native) state is purely
relaxational. Here the "end" temperature $T_E^{unfold/fold}$ is defined by
$Z_{N,N} = Y_{N-1}$ for unfolding and by $Y_1 = Z_{N,0}$ for folding.



V. DISCUSSION 

We have analyzed a simple β-hairpin model and discussed the significance of
side-chain packing regularities. We found that side-chain packing
regularities are necessary to generate 2-state transitions between coil and
native structures. There will be an upper limit to the hairpin size N for
which this behavior persists, but this does not affect its applicability to
biologically relevant finite-sized chains. Since it has been shown that
folding of a β-hairpin captures much of the basic physics of protein
folding [?] (imagine the folding of two α-helices connected by a β turn; the
coupling between the "secondary" and "tertiary" structure there is similar
to the coupling between "primary" backbone stiffness and "secondary" contact
formation here), our model can provide a fundamental understanding of how a
protein can fold.

Why is the coupling induced by side-chain packing important? The reason is
as follows. In general, the ensemble space of a hairpin polymer contains two
entropic components: one is configurational, concerning the local loop
entropy, and the other is combinatory, counting the possible arrangements of
the hydrogen bonds in a partially folded state. The combinatory part grows
exponentially as the system size increases, for states which have a finite
fraction of the possible hydrogen bonds. This might compensate for the
configurational entropy loss compared to the coil state and the relatively
lower hydrogen-bond energy gain compared to the native hairpin state. The
partially folded state then becomes thermodynamically predominant, and the
2-state transition between coil and native states will be destroyed. To
avoid this situation, a collective effect must be imposed. For the
on-lattice model, this needed effect can come from the restricted
arrangement of the polymer residues, since the alignment of one part of the
polymer will affect the others and the influence ultimately reaches the
whole length of the chain [?]. In contrast, there is no such effect in
off-lattice models; instead, one has to design the Hamiltonian carefully to
obtain the desired 2-state behavior.
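The exponential growth of the combinatory part can be made concrete with a small Python sketch. It assumes, as a simplification that ignores loop weights and cooperativity, that a partially folded state is specified only by which n of the N residue pairs are in contact, so the number of arrangements is the binomial coefficient; the function name is illustrative.

```python
import math

def log_arrangements(n_sites, n_contacts):
    """ln of the number of ways to place n_contacts hydrogen bonds
    among n_sites residue pairs (a binomial coefficient)."""
    return math.log(math.comb(n_sites, n_contacts))
```

At half filling the combinatory entropy per residue pair, `log_arrangements(N, N // 2) / N`, increases with N and approaches ln 2, which is the extensive entropy that the surface-tension penalty has to overcome.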

It appears that the "side-chain packing" regularity is the essential
ingredient to allow the all-or-none folding transition [?]; this is perhaps
similar to the ideas of "hydrophobic collapse" and "non-additive" force put
forth by other groups. The side-chain packing is essentially dependent on
the "matching" of the local peptide backbone conformation and the
hydrogen-bond formation [?]. We have argued that this fact leads to new
cooperative terms in the Hamiltonian, as having some residues
hydrogen-bonded will bias other residues to form their native contacts and
also locally restrict conformational entropy. One critical consequence of
this is the creation of an effective "surface tension" between the folded
and unfolded regimes. This energy cost will compete with the combinatory
entropy gain of a partially folded state, and the all-or-none transition can
be restored. This side-chain effect is not present in systems undergoing the
α-helix-coil transition, manifesting the essential difference between
α-helix and β-hairpin formation.

Our model is, of course, greatly simplified compared to the actual β-hairpin
system. One immediate criticism is that we have neglected non-native
hydrogen bond formation by using the Go interaction. To date, NMR evidence
suggests [?] that states with non-native bonds have minimal thermodynamic
weight, lending support to the adequacy of this approach. These neglected
effects, however, might produce local minima in the energy landscape and
trap misfolded structures, leading to a glassy molten globule.

Another problem is our use of uniform strengths for all interactions. This
was necessary because of our desire to develop a recursion relationship
which enables us to do calculations for reasonably high values of N. One can
extend the model to heterogeneous couplings, but then one will have to
resort to exact evaluation of each of the $2^N$ different states of the Go
contacts. This would limit us to small polymers; also, the cooperative
"zippering" behavior in the folding transition might be altered to start
from the center of a hydrophobic cluster instead of the β turn [?]. In any
case, we do not believe that modest heterogeneity will lead to any
significant changes in our results regarding the source of the cooperativity
in this class of systems.